PATENT 

Attorney Docket No.: 0001-1US 

METHOD AND SYSTEM FOR THE MULTIDIMENSIONAL 
MORPHOLOGICAL RECONSTRUCTION OF GENOME EXPRESSION 
ACTIVITY 

CROSS-REFERENCES TO RELATED APPLICATIONS 
[01] This application is a continuation of and claims the benefit of U.S. 
Provisional Application No. 60/ , by Doyle et al., entitled METHOD AND 
SYSTEM FOR THE MULTIDIMENSIONAL MORPHOLOGICAL RECONSTRUCTION 
OF GENOME EXPRESSION ACTIVITY, filed July 28, 2000, the disclosure of which is 
incorporated herein by reference. 

BACKGROUND OF THE INVENTION 
[02] Genome sequencing 

[03] Although biological science finds its roots in a grand tradition of 
exploratory investigation, for many years, basic research in biology and medicine has focused 
on a constructionist approach. With the advent of powerful manipulative techniques in 
molecular biology, most researchers in recent decades have focused on constructing new 
biological "scenarios" rather than merely observing existing systems. They have done this by 
perturbing various parameters of otherwise naturally-occurring systems and observing the 
effect on system dynamics, functional characteristics, etc. 

[04] The federally sponsored Human Genome Project (HGP) has recently 
re-legitimized the exploratory approach for life scientists. The new availability of complete 
genome sequence information for a variety of species has motivated many large new projects 
focused entirely on "mining" these data in order to learn more about the basic functions of 
biological structures and their development through time. 

[05] Early progress in the HGP took a directed approach. The federally 
funded sequencing centers concentrated on the targeted sequencing of specific important 
genes, working out the gene sequence from start to finish. This approach promised a long 
and difficult road to completing the entire genome. 

[06] Craig Venter, a former NIH researcher, advocated a different 
approach: splitting the entire genome into small fragments and working on them en masse. 
This involved dividing the sequencing task among many automatic sequencing machines and 
attacking the task in parallel, with large numbers of short sequences being determined before 
proceeding to further batches of short fragments. Computer scientists then reconstructed the 
fragments' proper order using algorithmic overlap-analysis methods first proposed by Leroy 
Hood. This method came to be called "shotgun sequencing" and, although persistently derided 
by the established authorities in the HGP, it proved to be extremely effective in making rapid 
progress toward the goal of sequencing an entire genome. This work led to the joint 
announcement on June 26, 2000 by J. Craig Venter, president of Celera Genomics 
(http://www.celera.com), and National Human Genome Research Institute director Francis 
S. Collins of completion of "the first survey of the entire human genome." The "survey" is 
the "working draft" of the human genome produced by the publicly funded international 
consortium HGP and the "first assembly of the human genome" produced by privately 
funded Celera Genomics. 

[07] With the sequencing of the genome nearly complete, the major focus 
of research is changing. Since gene sequences code for amino acids, the basic building 
blocks for proteins, many molecular biologists feel that the best place to focus is on creating 
large libraries of the specific proteins that are coded for by the known genes in the genomic 
sequence. This field of research is referred to as proteomics [Pandey, A. and M. Mann, 
Nature, 405(6788):837-46 (2000)]. Other scientists are focused on the task of computational 
prediction of the 3-dimensional structure of protein molecules directly through analysis of the 
primary genomic sequences. This area of work is called structural genomics. 

[08] Still other scientists, recognizing that the ultimate goal for most life 
scientists is understanding biological function in normal and diseased states, are focusing 
more directly on the task of attempting to find specific correlations between gene systems and 
phenotypic patterns, linking gene sequences directly to clinically-relevant effects. This work 
is part of what is called functional genomics [Eisenberg, D., et al., Nature, 405(6788):823-6 
(2000)]. Functional genomics begins with all available sequence information in pursuit of 
biological understanding [Lockhart, D. J. and E.A. Winzeler, Nature, 405(6788):827-36 
(2000)]. 

[09] A primary focus of functional genomics is gene expression analysis. 
This involves the use of a variety of techniques to detect the presence of mRNA sequences 
within specific tissues. This is done by taking advantage of an effect first observed by 
Southern, namely the tendency of free nucleotide sequence fragments to hybridize with their 
complementary mates (see [Southern, E. et al., Nat Genet, 21(1 Suppl):5-9 (1999)] for a 
recent review). By attaching these sequence fragments to solid supports, and by taking 
advantage of the binding of various marker molecules to solubilized mRNA, researchers are 
able to image specific gene expression activity. 

[10] cDNA Microarrays 

[11] Since that early work, DNA hybridization technology has taken a 
tremendous leap forward with the ability to screen a broad spectrum of gene 
messages at once, through the use of cDNA microarrays [Eisen, M.B. and P.O. Brown, 
Methods Enzymol, 303:179-205 (1999); Brown, P.O. and D. Botstein, Nat Genet, 21(1 
Suppl):33-7 (1999); Cheung, V.G., et al., Nat Genet, 21(1 Suppl):15-9 (1999)]. "Gene chips" 
consist of a solid support to which is attached a regular array of DNA fragments. They are 
generally created through the use of a robotic system, which coordinates the laying down of a 
"raster" grid of the DNA probe fragments. The robot deposits this regular grid of pre- 
determined DNA sequence "spots" onto a fixed substrate, such as a specially-coated glass 
slide. 

[12] These broad-spectrum cDNA chips are organized so that a wide 
assortment of probes is arrayed in a geometric grid layout, so that the x,y coordinate of 
each grid location can be used by a computer system to keep track of which probe is at that location. 
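
As a non-limiting illustration, the coordinate bookkeeping described above can be sketched in Python. The probe names, the grid width, and the row-major deposition order are assumptions made for illustration only; they are not taken from any particular arraying system.

```python
from typing import Dict, List, Tuple

GridCoord = Tuple[int, int]  # (x, y) = (column, row) on the array


def build_probe_map(probe_names: List[str], n_cols: int) -> Dict[GridCoord, str]:
    """Assign probes to grid cells in row-major (raster) order.

    Assumption: the robot deposits probes left to right, top to bottom.
    """
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    return {(i % n_cols, i // n_cols): name for i, name in enumerate(probe_names)}


# Hypothetical probe identifiers, used only as placeholders.
probe_map = build_probe_map(
    ["probe_A", "probe_B", "probe_C", "probe_D", "probe_E", "probe_F"], n_cols=3
)

# Look up which probe sits at column 2, row 1 of the grid.
print(probe_map[(2, 1)])
```

A dictionary keyed by (x, y) keeps the lookup independent of the deposition order, so a different robot layout would only change how the map is built.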

[13] The basic steps of a typical microarray analysis are as follows: 1) The 
tissue to be studied is selected and prepared for RNA extraction. This typically involves 
homogenization of the tissue to free into solution the desired macromolecules. 2) The mRNA 
is extracted using standard techniques and then is subjected to reverse transcription in order 
to produce complementary strands of cDNA molecules. 3) The cDNA molecules are usually 
synthesized using labeled nucleotides. Use of different labels allows for easy comparison of 
different mRNA populations. 4) The cDNA probes are then tested by hybridizing them to a 
DNA microarray. Arrays with more than 250,000 oligonucleotides or 10,000 different 
cDNAs per square centimeter can now be mass-produced [Lockhart, D. J. and E.A. Winzeler, 
Nature, 405(6788):827-36 (2000)]. 5) Finally, computer-based image acquisition, processing 
and analysis are used to quantitate the strength of fluorescent signal at each of the microarray 
grid locations, thereby providing evidence of the presence and concentration of mRNA 
corresponding to each of the genes associated with the microarray chip. 

[14] Laser Capture Microdissection 

[15] Since the gene expression activity of organs and tissues can be quite 
complex, it is desirable to use a technique which allows analysis of the gene expression but 
also permits the morphologic localization of the area to be studied, thus avoiding the loss of 
morphological detail that results from the homogenization process. Laser capture 
microdissection (LCM) allows this to be done with great specificity [Bonner, R.F., et al., 
Science, 278(5342):1481, 1483 (1997); Cole, K.A. et al., Nat Genet, 21(1 Suppl):38-41 
(1999); Emmert-Buck, M.R., et al., Science, 274(5289):998-1001 (1996)] 
(http://mecko.nichd.nih.gov/lcm/lcm.htm). 

[16] Microdissection-based gene expression analysis begins with the use of 
a nonaldehyde fixation of the tissue to be studied, using a fixative such as 70% ethanol, since 
aldehyde fixatives disrupt RNA structure. A low-temperature embedding medium, such as 
polyethylene glycol distearate, is used to embed the tissue in preparation for histological 
sectioning. Thin tissue sections are cut, at a thickness of 8 µm, for example, and then are 
mounted on uncovered glass slides. A thin membrane is typically applied to the section 
surface to prevent cross-contamination of macromolecules. A UV laser is then used to 
perform cold ablation of thin lines of tissue, creating an incision around a specific area of the 
tissue section without disturbing surrounding tissue. A specialized adhesive carrier film is 
used to transfer the incised portion of the tissue section to an Eppendorf microfuge tube with 
lysis buffer. The cells are lysed in the buffer and can be used for mRNA analysis. 

[17] 3D localization 

[18] The above microdissection technique has been used by Cole, et al. 
[Cole, K.A. et al., Nat Genet, 21(1 Suppl):38-41 (1999)], to study the cellular-level gene 
expression activity associated with prostate cancer. These investigators used serial-section 
histological techniques to precisely identify and then excise specific tumor cells within the 
prostate gland for microarray analysis of expression activity. The investigators then 
interactively annotated 3D volume reconstructions of gland section images to overlay 
expression data relating to the specific cells that had been microdissected. It should be noted 
that this study focused on only small groups of specific tissue areas, since the microdissection 
approach requires a skilled operator and is extremely exacting work. Tissue that isn't used 
for expression analysis is stained for anatomical reconstruction of the gland architecture, 
rendering it unusable for further expression analysis. Since this approach is targeted to 
specific areas of the tissue, it is most useful for specifically targeted studies, and is poorly 
suited for survey-based exploratory analysis. 

[19] Volumetric reconstruction is well known for the macroscopic-level 
medical imaging techniques of MRI and CT scanning. These 3-dimensional raster-imaging 
techniques provide useful volumetric surveys for specific anatomical features, but are 
typically suited for imaging only specific sorts of biologic activity. In order to increase the 
usefulness of these methods, various researchers have investigated the combination of multiple 
imaging modalities, such as MRI and PET scanning, in order to take advantage of the 
anatomical structure imaging features of the MRI approach, while exploiting the functional 
data yielded by the PET scanning approach. These multiple datasets are sometimes 
superimposed upon the same 3-dimensional coordinate space in order to aid in visualization 
of the functional and structural details. 

[20] A similar capability can be provided at a microscopic histological 
level, through the use of multi-modal imaging of serial microscopic sections for 3D 
reconstruction and analysis. Alternating serial sections are placed on separate glass slides, 
with one set of alternating sections stained and coverslipped for histological detail, and the 
other set of adjacent alternating sections left uncovered for further processing. For each 
structure seen in a stained coverslipped section, the adjacent section could be easily processed 
using other techniques. This method is described in detail in Doyle [Doyle, M.D., The 
intraorgan lymphatic system of the rat left ventricle in normalcy and aging, Univ. of Illinois 
at Urbana-Champaign, University Microfilms, order number 9210786 (1991)], where it was 
used to coordinate light microscopic and electron microscopic examination of the 
three-dimensional aspects of tissue specimens. 

[21] Various tools are available for the interactive volume visualization of 
3-D biomedical image data. One example is given by the MultiVIS client-server 
Internet-based distributed visualization system developed by Doyle, et al. [Doyle, M. et al., The 
Visible Embryo Project: A Platform for Spatial Genomics, in 28th AIPR Workshop: 3D 
Visualization for Data Exploration and Decision Making (2000); Doyle, M., et al., MultiVIS: 
A Web-based interactive remote visualization environment and navigable volume imagemap 
system, in 28th AIPR Workshop: 3D Visualization for Data Exploration and Decision 
Making (2000)]. The MultiVIS system is also a good example of a system which allows for 
the mapping of both volume image data and other types of data, such as object identity 
information, onto a single x,y,z coordinate space. This system has been used for a variety of 
purposes, such as for providing an interactive online 3-D atlas of the Visible Human Project 
male dataset [Doyle, M., et al., MultiVIS: A Web-based interactive remote visualization 
environment and navigable volume imagemap system, in 28th AIPR Workshop: 3D 
Visualization for Data Exploration and Decision Making (2000)]. All the references listed in 
this paragraph are hereby incorporated by reference for all purposes. 
[22] Unsolved problems 
[23] Although the above-described existing technologies have enabled 
numerous advances in biomedical science and industry, there are several long-felt but 
unsolved needs for which a solution has not been obvious before the present invention. One 
need is to gather gene expression data in a manner that supports the types of exploratory 
research that can take advantage of the broad-spectrum types of biologic activity analysis 
enabled by today's microarray tools. Further, there is a serious need for methods to visualize 
the spatial distribution of the biologic activity of a wide range of genes, across a wide array of 
species and tissue types. There is a great need for technology to allow the collection of large 
volumes of these types of data, to enable exploratory investigations into patterns of biologic 
activity that may provide insights into both normal and abnormal biologic states. And there 
is certainly a need to correlate gene expression data with morphological structure in a useful 
and easy to understand manner, such as in a volume visualization environment. 

[24] Each of these needs is evident across all species and ages; however, 
there is a particular need for these problems to be solved in order to enable researchers to 
make significant progress in the study of early development. Many breakthroughs in 
biomedical science will only occur through study of organism growth and development. 
Deciphering the delicate interplay between the spatial expression patterns of various genes 
and the timings of these biological events is among the most difficult of biomedical research 
questions. In order to solve such problems, tools are needed to allow the collection of larger 
volumes of expression data across a wider spectrum of gene types than ever before. 

BRIEF SUMMARY OF THE INVENTION 
[25] The present invention provides novel and useful methods and systems 
which help to solve these problems. A new field of work, which is enabled by the present 
invention, is called "spatial genomics." 
[26] According to one aspect of the present invention, a method and system 
for the multidimensional morphological reconstruction of tissue biological activity makes it 
possible for a biological tissue specimen to be imaged in multiple dimensions to allow 
morphological reconstruction. The same tissue specimen is physically sampled in a regular 
raster array, so that tissue samples are taken in a regular multidimensional matrix pattern 
across each of the dimensions of the tissue specimen. Each sample is isolated and coded so 
that it can be later correlated to the specific multidimensional raster array coordinates, 
thereby providing a correlation with the sample's original pre-sampling morphological 
location in the tissue specimen. Each tissue sample isolate is then analyzed with 
broad-spectrum biological activity methods, providing information on a multitude of biologic 
functional characteristics for that sample. The resultant raster-based biological characteristic 
data may then be spatially mapped onto the original multidimensional morphological matrix 
of image data. 

[27] According to another aspect of the invention, various types of analysis 
may then be performed on the resultant correlated multidimensional spatial datasets. 

[28] Other features and advantages of the invention will be apparent in view 
of the following detailed description and appended drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[29] Fig. 1 is a flowchart illustrating a preferred embodiment of the 
invention; and 

[30] Fig. 2 is a diagram depicting the application of an embodiment of the 
invention to rasterize embryo tissue. 

DETAILED DESCRIPTION OF THE INVENTION 
[31] A specific embodiment of the invention can be used for the study of 
gene expression analysis as described below. 

[32] 1) Morphological Imaging 

[33] Biological tissue is processed for histological sectioning, using the 
non-aldehyde fixation method (70% ethanol) and low-temperature embedding medium as 
described in Cole, et al. [Cole, K.A. et al., Nat Genet, 21(1 Suppl):38-41 (1999)]. 
Histological thin sections are then cut, at a thickness of 8 µm, from the embedded tissue, 
producing two sets of alternating serial sections, as described in Doyle [Doyle, M.D., The 
intraorgan lymphatic system of the rat left ventricle in normalcy and aging, Univ. of Illinois 
at Urbana-Champaign, University Microfilms, order number 9210786 (1991)], with one set 
being histologically-stained for morphological detail and coverslipped for light microscopy. 
The other set is mounted on glass slides and left unstained with no coverslips, with a 
microdissection membrane to prevent cross-contamination of macromolecules (see 
http://mecko.nichd.nih.gov/lcm/LCMTAP.htm#Laser Transfer and 
http://www.sl-microtest.com/MICRO/m_04_e.htm for detailed protocols). 

[34] 2) Tissue rasterization 

[35] A UV laser of the type described in Cole, et al., [Cole, K.A. et al., Nat 
Genet, 21(1 Suppl):38-41 (1999)] is used to incise a grid pattern across each tissue section of 
the uncovered set of alternating serial sections described in #1 above. This is done with the 
use of said UV laser adapted to the application end of a microarray-creation robotic 
apparatus, as described in Cheung [Cheung, V.G., et al., Nat Genet, 21(1 Suppl):15-9 
(1999)]. This allows for unattended section incising of a large number of specimens. A 
second adaptation of the robotic apparatus [Cheung, V.G., et al., Nat Genet, 21(1 Suppl):15-9 
(1999)] adds a microdissection-transfer film holder to the application end of the apparatus. 
This transfer film holder is then used to lift each incised section sample from each grid 
location on each section and transfer each sample to a uniquely-coded isolation tube for lysis 
and further processing. The sample isolation tubes are arranged in spatial arrays, where each 
tube is bar coded to indicate the x,y,z tissue-space coordinate of the original pre-sampling 
morphological matrix location of the sample. 
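
By way of illustration only, the coding of each isolation tube with its x,y,z tissue-space coordinate can be sketched as a reversible text label. The label layout (`X....Y....Z....` with zero-padded fields) and the function names are hypothetical; the disclosure does not prescribe a particular bar code symbology or payload format.

```python
import re
from typing import Tuple

# Hypothetical label format: X, Y and Z fields, each zero-padded to a fixed width.
_LABEL_PATTERN = re.compile(r"^X(\d+)Y(\d+)Z(\d+)$")


def encode_label(x: int, y: int, z: int, width: int = 4) -> str:
    """Build the text payload for a tube bar code from raster coordinates."""
    for value in (x, y, z):
        if not 0 <= value < 10 ** width:
            raise ValueError(f"coordinate {value} does not fit in {width} digits")
    return f"X{x:0{width}d}Y{y:0{width}d}Z{z:0{width}d}"


def decode_label(label: str) -> Tuple[int, int, int]:
    """Recover the (x, y, z) raster coordinates from a scanned label."""
    match = _LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    x, y, z = (int(group) for group in match.groups())
    return (x, y, z)


label = encode_label(12, 7, 305)
print(label, decode_label(label))
```

Because decoding inverts encoding exactly, any downstream measurement keyed by the label can be returned to its pre-sampling location in the morphological matrix.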

[36] 3) RNA amplification 

[37] The mRNA can be amplified [Phillips, J. and J.H. Eberwine, Methods, 
10(3):283-8 (1996)]. Amplification can also be done using PCR on the cDNA produced by 
reverse transcription of the mRNA. 

[38] 4) cDNA Microarray analysis 

[39] Each of the mRNA samples is then subjected to DNA microarray 
analysis [Eisen, M.B. and P.O. Brown, Methods Enzymol, 303:179-205 (1999)]. Reverse 
transcription is performed on each tissue sample isolate, in order to produce complementary 
strands of cDNA molecules. The cDNA can be labeled by using labeled nucleotides or the 
cDNA can be fluorescently labeled. The cDNA probes are then tested by hybridizing them to 
a DNA microarray. A preferred embodiment uses redundancy of probe locations as an 
internal control against solution inhomogeneity and other processing variations. Finally, 
computer-based image acquisition, processing and analysis are used to quantitate the strength 
of fluorescent signal at each of the microarray grid locations. 
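
One possible way to exploit redundant probe locations as an internal control is sketched below. The use of the coefficient of variation, the 0.2 cutoff, and the gene names are assumptions chosen for illustration; the disclosure does not specify how replicate spots are to be compared.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarize_replicates(
    spot_intensities: Dict[str, List[float]], max_cv: float = 0.2
) -> Dict[str, Dict[str, object]]:
    """Average replicate spots per gene and flag inconsistent replicates.

    max_cv is an assumed cutoff on the coefficient of variation
    (population standard deviation divided by the mean).
    """
    summary: Dict[str, Dict[str, object]] = {}
    for gene, values in spot_intensities.items():
        avg = mean(values)
        cv = pstdev(values) / avg if avg > 0 else float("inf")
        summary[gene] = {"mean": avg, "cv": cv, "consistent": cv <= max_cv}
    return summary


# Hypothetical fluorescence readings from redundant spots of two probes.
readings = {"gene_a": [100.0, 110.0, 90.0], "gene_b": [50.0, 200.0]}
summary = summarize_replicates(readings)
for gene, stats in summary.items():
    print(gene, stats)
```

Replicates that disagree strongly, as for the second probe here, would suggest solution inhomogeneity or another processing variation at that location.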

[40] 5) Spatial Data Mapping 

[41] The gene expression data resulting from #4 are then spatially mapped 
onto the original multidimensional morphological matrix of image data. This is done by 
setting parameter bits in voxel data, to superimpose the expression message distribution upon 
the morphological volume image data. The volume image data is correlated with the x, y, z 
coordinates of the rasterized tissue samples so that the locations of tissue 
samples are accurately identified in the image data. This allows various types of analysis to be 
performed on the resultant correlated multidimensional spatial datasets. The details of 
implementing spatial mapping are well-known in the computer arts and not described in 
detail here. 
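
For illustration, one minimal sketch of setting parameter bits in voxel data is given below. The bit layout (low 8 bits for image intensity, one flag bit per gene above them), the expression threshold, and the dictionary-based volume are all assumptions; an actual implementation could use any voxel format.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]

INTENSITY_BITS = 8  # assumed: low 8 bits hold the grayscale image value
INTENSITY_MASK = (1 << INTENSITY_BITS) - 1


def set_expression_bit(voxel_value: int, gene_index: int) -> int:
    """Set the flag bit for one gene without disturbing the image intensity."""
    return voxel_value | (1 << (INTENSITY_BITS + gene_index))


def has_expression_bit(voxel_value: int, gene_index: int) -> bool:
    """Report whether the flag bit for one gene is set."""
    return bool(voxel_value & (1 << (INTENSITY_BITS + gene_index)))


def map_expression(
    volume: Dict[Coord, int],
    expression: Dict[Coord, Dict[str, float]],
    gene_order: List[str],
    threshold: float,
) -> Dict[Coord, int]:
    """Superimpose thresholded expression data on a copy of the volume."""
    mapped = dict(volume)
    for coord, levels in expression.items():
        if coord not in mapped:
            continue  # sample has no matching image voxel
        for index, gene in enumerate(gene_order):
            if levels.get(gene, 0.0) >= threshold:
                mapped[coord] = set_expression_bit(mapped[coord], index)
    return mapped


# Hypothetical two-voxel volume and one measured sample.
volume = {(0, 0, 0): 120, (1, 0, 0): 200}
expression = {(0, 0, 0): {"gene_a": 5.0, "gene_b": 0.5}}
mapped = map_expression(volume, expression, ["gene_a", "gene_b"], threshold=1.0)
print(mapped)
```

Keeping the flags in bits above the intensity field means the morphological image can still be recovered with a single mask operation.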
[42] Some exemplary uses of the spatially mapped data will now be 
described. A researcher may desire information regarding mRNA synthesis at a particular 
location, expressed in x, y, z coordinates, of a tissue sample. A 3-dimensional view of the 
tissue would be displayed on the computer screen, allowing the researcher to click on a voxel 
at the desired location. Techniques for creating an interactive 3-D volume visualization are 
described in the MultiVIS references cited above. The mRNA synthesis data mapped to 
the voxel would be displayed in a variety of possible formats, e.g., as a table or a graph. 
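
The voxel lookup described in this paragraph might be sketched as follows. The data structure (a dictionary from voxel coordinates to per-gene levels), the gene names, and the plain-text table layout are illustrative assumptions; the interactive 3-D display itself is outside the scope of the sketch.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def lookup_voxel(
    expression: Dict[Coord, Dict[str, float]], coord: Coord
) -> List[Tuple[str, float]]:
    """Return (gene, level) pairs for a selected voxel, highest level first."""
    levels = expression.get(coord, {})
    return sorted(levels.items(), key=lambda item: item[1], reverse=True)


def format_table(rows: List[Tuple[str, float]]) -> str:
    """Render the lookup result as a simple fixed-width text table."""
    lines = [f"{'gene':<12}{'level':>8}"]
    lines += [f"{gene:<12}{level:>8.2f}" for gene, level in rows]
    return "\n".join(lines)


# Hypothetical expression data for a single voxel.
expression = {(3, 4, 5): {"gene_a": 2.0, "gene_b": 7.5}}
rows = lookup_voxel(expression, (3, 4, 5))
print(format_table(rows))
```

A voxel with no associated sample simply yields an empty table, which a viewer could present as "no data" for that location.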

[43] Alternatively, a researcher may desire information about the 
expression of a specific gene throughout the tissue sample. In this case, the gene expression 
data for each voxel is searched to determine whether the specific gene has been expressed. 
The display is then modified so that the three dimensional image is coded to show the 
locations where the specific gene is expressed and, optionally, the relative amount of 
expression. 
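
A minimal sketch of such a gene search is shown below. Normalizing each hit by the peak level to obtain a relative amount, the zero threshold, and the gene names are assumptions made for illustration; how the result is color coded in the display is left open.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


def locate_gene(
    expression: Dict[Coord, Dict[str, float]], gene: str, threshold: float = 0.0
) -> Dict[Coord, float]:
    """Find voxels expressing a gene and scale each level to the peak (0 to 1]."""
    hits = {
        coord: levels[gene]
        for coord, levels in expression.items()
        if levels.get(gene, 0.0) > threshold
    }
    if not hits:
        return {}
    peak = max(hits.values())
    return {coord: level / peak for coord, level in hits.items()}


# Hypothetical expression data for three voxels.
expression = {
    (0, 0, 0): {"gene_a": 2.0},
    (1, 0, 0): {"gene_a": 8.0, "gene_b": 1.0},
    (2, 0, 0): {"gene_b": 3.0},
}
print(locate_gene(expression, "gene_a"))
```

The returned relative amounts could drive a color or opacity scale when the three dimensional image is redrawn.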

[44] Most aspects of each of these elements of the invention can be 
completely automated, thereby allowing for large scale analysis of many tissue specimens. 

[45] 6) A Specific Example 
[46] A specific example illustrating the use and advantages of the 
above-described techniques will now be described. A human embryo 100 having a length of about 
5 mm is microdissected. The z axis is defined along the dorsal axis and slices of about 8 
microns are prepared along the length of the z axis. As described above, alternating sets of 
serial slices are formed. Each slice from one of the sets is then microdissected into squares 
of about 8 microns to rasterize the slice. Thus, voxels 104 in the form of 8 micron cubes are 
defined, each voxel labelled by its x, y, z coordinates. 
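
The approximate arithmetic implied by this example can be worked through in a short sketch. The figures are the nominal 5 mm length and 8 micron slice thickness given above; which of the two alternating sets is rasterized, and the helper name `voxel_index`, are assumptions for illustration.

```python
EMBRYO_LENGTH_UM = 5000   # about 5 mm, expressed in microns
SLICE_THICKNESS_UM = 8    # about 8 microns per slice
VOXEL_EDGE_UM = 8         # about 8 micron squares within each slice

# Number of serial slices along the z axis.
total_slices = EMBRYO_LENGTH_UM // SLICE_THICKNESS_UM

# Alternating sets: assume even-numbered slices are stained for imaging
# and odd-numbered slices are rasterized for expression analysis.
stained_set = list(range(0, total_slices, 2))
rasterized_set = list(range(1, total_slices, 2))


def voxel_index(x_um: float, y_um: float, slice_number: int) -> tuple:
    """Convert an in-plane position (microns) and slice number to (x, y, z)."""
    return (int(x_um // VOXEL_EDGE_UM), int(y_um // VOXEL_EDGE_UM), slice_number)


print(total_slices, len(stained_set), len(rasterized_set))
print(voxel_index(20.0, 9.0, 3))
```

Under these nominal dimensions each set holds roughly 312 slices, so expression data is sampled at every other 8 micron step along z.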

[47] The tissue in each voxel is then processed as described above to 
determine the amount of mRNA expression for each tissue sample. The expression data for each 
voxel is then mapped to the coordinates of that voxel. 

[48] ALTERNATIVE EMBODIMENTS 

[49] Although the specific embodiment described above focuses on the 
study of gene expression activity, and uses a specific embodiment suited to that purpose, it 
will be clear to one of ordinary skill in the art that other types of biological activity can be 
studied using the method of the present invention and that many alternative embodiments are 
possible which conform to the structure and method of the present invention. 

[50] Various alternative embodiments of the present invention are possible 
without changing the fundamental nature of the system. These include, in part: 1) use of a 
variety of other imaging methods, 2) use of other raster-based sampling methods, 3) use of 
other ways to isolate tissue samples, 4) use of other types of RNA amplification, such as 
modified PCR approaches or amplification of the cDNA, 5) analysis of other types of biologic 
activity, such as proteins and other ligands, by monoclonal antibody binding, or any other 
types of local reactivity that can trigger a visible signal, 6) use of other types of broad 
spectrum macromolecular hybridization analysis, by microbead columns, for example, and 7) 
use of a variety of other types of data mapping and analysis. 

[51] The invention has now been described with reference to the preferred 
embodiments. Alternatives and substitutions will now be apparent to persons of skill in the 
art. For example, the dimensions and particular microdissection techniques described above 
are not critical to the invention. Various types of computer systems and languages are 
suitable for use of the invention and implementation utilizing the Internet would be 
appropriate. Accordingly, it is not intended to limit the invention except as provided by the 
appended claims. 